6 Appendix
As described in Section 3, MemRecall is the process of extracting the key blocks. We also need "strides" beyond plain lexical matching: BM25 is a well-known TF-IDF-like information retrieval method that scores each block by the words it shares with the query or textual label, so semantic relevance is neglected. GloVe, a family of pretrained word representations, can supply this missing semantic signal. The entries below show blocks retrieved for the query "textual label".
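To make the contrast concrete, the following is a minimal sketch of the two scoring signals, not the actual MemRecall implementation: a standard BM25 (Okapi) scorer, in which a block earns score only from words it shares with the query, and a GloVe-style semantic score that averages word vectors and compares them by cosine similarity. The tiny 3-dimensional vectors are hypothetical stand-ins for real pretrained GloVe embeddings (typically 50-300 dimensions).

```python
import math
from collections import Counter

def bm25_scores(query, blocks, k1=1.5, b=0.75):
    """Standard BM25: purely lexical, so semantically related but
    lexically different blocks score zero."""
    n = len(blocks)
    avgdl = sum(len(blk) for blk in blocks) / n
    df = Counter(t for blk in blocks for t in set(blk))  # document frequency
    scores = []
    for blk in blocks:
        tf = Counter(blk)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue  # only shared words contribute (TF-IDF-like)
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(blk) / avgdl))
        scores.append(s)
    return scores

# Hypothetical stand-ins for pretrained GloVe vectors.
TOY_VECS = {
    "textual": [1.0, 0.1, 0.0],
    "label":   [0.0, 1.0, 0.1],
    "labels":  [0.0, 0.9, 0.2],  # near-synonym of "label"
    "tag":     [0.1, 0.8, 0.3],
}

def semantic_score(query, block):
    """GloVe-style fix: cosine similarity of averaged word vectors."""
    def mean_vec(tokens):
        vecs = [TOY_VECS[t] for t in tokens if t in TOY_VECS]
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    q, d = mean_vec(query), mean_vec(block)
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(a * a for a in d))
    return dot / norm

query = "textual label".split()
blocks = [doc.split() for doc in
          ("semantic tag for each block", "textual labels of audio classes")]
print(bm25_scores(query, blocks))                       # first block scores 0: no shared words
print([semantic_score(query, blk) for blk in blocks])   # both nonzero: semantics recovered
```

In the toy run, BM25 gives the first block a score of zero because it shares no surface form with the query (and even "labels" in the second block misses the query term "label"), while the embedding-based score ranks both blocks as relevant, illustrating why a semantic signal is needed on top of lexical matching.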
Creating User-steerable Projections with Interactive Semantic Mapping
Oliveira, Artur André; Espadoto, Mateus; Hirata, Roberto Jr.; Cesar, Roberto M. Jr.; Telea, Alex C.
Dimensionality reduction (DR) techniques map high-dimensional data into lower-dimensional spaces. Yet, current DR techniques are not designed to explore semantic structure that is not directly available in the form of variables or class labels. We introduce a novel user-guided projection framework for image and text data that enables customizable, interpretable data visualizations via zero-shot classification with Multimodal Large Language Models (MLLMs). We enable users to steer projections dynamically via natural-language guiding prompts, specifying high-level semantic relationships of interest that are not explicitly present in the data dimensions. We evaluate our method across several datasets and show that it not only enhances cluster separation, but also transforms DR into an interactive, user-driven process. Our approach bridges the gap between fully automated DR techniques and human-centered data exploration, offering a flexible and adaptive way to tailor projections to specific analytical needs.
- Europe > Netherlands (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Portugal > Coimbra > Coimbra (0.04)
- Research Report (0.64)
- Overview (0.46)
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance
Zhang, Yaoyun; Xu, Xuenan; Wu, Mengyue
The video-to-audio (V2A) generation task has drawn attention in the field of multimedia due to its practicality in producing Foley sound. Semantic and temporal conditions are fed to the generation model to indicate sound events and temporal occurrence. Recent studies on synthesizing immersive and synchronized audio face challenges on videos with a moving visual presence: the temporal condition is not accurate enough, while the low-resolution semantic condition exacerbates the problem. To tackle these challenges, we propose Smooth-Foley, a V2A generative model that takes semantic guidance from the textual label throughout generation to enhance both semantic and temporal alignment in the audio. Two adapters are trained to leverage pre-trained text-to-audio generation models: a frame adapter integrates high-resolution frame-wise video features, while a temporal adapter integrates temporal conditions obtained from similarities between visual frames and textual labels. The incorporation of semantic guidance from textual labels achieves precise audio-video alignment. We conduct extensive quantitative and qualitative experiments. Results show that Smooth-Foley outperforms existing models in both continuous sound scenarios and general scenarios. With semantic guidance, the audio generated by Smooth-Foley exhibits higher quality and better adherence to physical laws.
The Solution for Language-Enhanced Image New Category Discovery
Xu, Haonan; Chao, Dian; Wu, Xiangyu; Wan, Zhonghua; Yang, Yang
Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on textual labels to store visual information is insufficient for representing the diversity of visual objects. In this paper, we propose reversing the training process of CLIP and introducing the concept of Pseudo Visual Prompts. These prompts are initialized for each object category and pre-trained on large-scale, low-cost sentence data generated by large language models. This process mines the aligned visual information in CLIP and stores it in class-specific visual prompts. We then employ contrastive learning to transfer the stored visual information to the textual labels, enhancing their visual representation capacity. Additionally, we introduce a dual-adapter module that simultaneously leverages knowledge from the original CLIP and newly learned knowledge derived from downstream datasets. Benefiting from the pseudo visual prompts, our method surpasses the state-of-the-art not only on clean annotated text data but also on pseudo text data generated by large language models.
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
Better Few-Shot Relation Extraction with Label Prompt Dropout
Few-shot relation extraction aims to learn to identify the relation between two entities based on very limited training examples. Recent efforts found that textual labels (i.e., relation names and relation descriptions) can be extremely useful for learning class representations, which benefits the few-shot learning task. However, how best to leverage such label information in the learning process remains an important research question. Existing works largely assume such textual labels are always present during both learning and prediction. In this work, we argue that such approaches may not always lead to optimal results. Instead, we present a novel approach called label prompt dropout, which randomly removes label descriptions during learning. Our experiments show that our approach leads to improved class representations, yielding significantly better results on the few-shot relation extraction task.
- Asia > Singapore (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (2 more...)
- Research Report > Promising Solution (0.34)
- Research Report > New Finding (0.34)
- Research Report > Experimental Study (0.34)
Zero-Shot Audio Classification Based on Class Label Embeddings
This paper proposes a zero-shot learning approach for audio classification based on textual information about class labels, without any audio samples from the target classes. We propose an audio classification system built on a bilinear model, which takes audio feature embeddings and semantic class label embeddings as input and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information about audio classes and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with a small training dataset. It achieves accuracy (26% on average) better than random guessing (10%) in each audio category; in particular, it reaches up to 39.7% for the category of natural audio classes.
A Preliminary Analysis and Catalog of Thematic Labels
Wagner, Earl J. (University of Maryland, College Park)
An account of the labels commonly used to express themes could both help in assessing the coverage of models of narrative processing and support recognizing themes by the textual appearance of these labels. This paper presents a preliminary analysis and catalog of thematic labels such as “vicious cycle” and “underdog”. In contrast to a top-down approach that characterizes themes in terms of components of a model of narrative processing, a bottom-up approach is taken: thematic labels are gathered independently of any particular model and catalogued according to the types of relationships the corresponding themes convey.
- North America > United States > Maryland > Prince George's County > College Park (0.15)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)